Abstract

The Whitehead Minimization problem is a problem of finding elements of the min- 
imal length in the automorphic orbit of a given element of a free group. The classi- 
cal algorithm of Whitehead that solves the problem depends exponentially on the 
group rank. Moreover, it can be easily shown that exponential blowout occurs when 
a word of minimal length has been reached and, therefore, is inevitable except for 
some trivial cases.

In this paper we introduce a deterministic Hybrid search algorithm and its stochas- 
tic variation for solving the Whitehead minimization problem. Both algorithms use 
search heuristics that allow one to find a length-reducing automorphism in polyno- 
mial time on most inputs and significantly improve the reduction procedure. The 
stochastic version of the algorithm employs a probabilistic system that decides in 
polynomial time whether or not a word is minimal. The stochastic algorithm is very
robust: in all our experiments it never claimed a non-minimal element to be
minimal.
1 Introduction



The Whitehead Minimization problem is a problem of finding elements of 
the minimal length in the automorphic orbit of a given element of a free 
group. This problem is of great importance in group theory and topology and 
continually attracts a great deal of attention from the research community.

Starting from the seminal paper of Whitehead (1936), the Whitehead
Minimization problem has been studied extensively for more than 70 years (see
Lyndon and Schupp (1977); Cohen et al. (1981); [...] (2003); Khan;
Kapovich et al. (2004); Miasnikov and Shpilrain (2005); Kaimanovich et al.
(2005); Kapovich (2006)), and still the complexity of this problem is unknown.



One of the most important applications of the Whitehead Minimization problem
is that its solution is part of the solution to the famous Automorphism
Problem in free groups introduced by J. H. C. Whitehead in 1936. Methods used
to solve the Whitehead Minimization problem can be used to decide whether
an element is a part of a generating basis of a free group. The same methods
and their generalizations are used in solving equations over free groups
(see Razborov (1985)). To practitioners, the Whitehead Minimization problem
could be of interest because of its relation to non-commutative variations of
the public key cryptographic scheme by Moh (1999).



All known methods of solving the Whitehead Minimization problem have ex- 
ponential dependence on the rank of a free group. Moreover, the worst case sce- 
nario occurs when solving a termination problem (which is to decide whether 
or not a given element is minimal) for a minimal element. Since the goal of the 
Whitehead Minimization problem is to find a minimal element, the worst case 
is inevitable for almost all elements except for elements of a very particular 
type. This observation leads us to a conclusion that the known deterministic 
techniques are not suitable for groups of large ranks. 



Haralick et al. (2004, 2005), Miasnikov (2004), and Miasnikov and Myasnikov
(2004), using methods of pattern recognition and exploratory data analysis,
show that by introducing proper strategies one can construct a length
reduction process which is very efficient on most inputs. Furthermore, in
these papers we formulate several conjectures (see Section 2) regarding the
various properties of the problem.

In this paper we present a new algorithm for solving the Whitehead
Minimization problem. It is a hybrid algorithm in the sense that it employs
several stochastic, as well as deterministic, procedures based on the
conjectures stated by Haralick et al. (2005).



We combine a stochastic search algorithm and heuristic search procedures
(both described in Section 3.1) with the probabilistic classification system
"recognizing" minimal elements (see Section 3.2) to construct a Hybrid
Deterministic Whitehead Reduction (HDWR) algorithm solving the Length
Reduction Problem in a polynomial number of steps (in terms of group rank)
on most input words from a free group. The resulting algorithm is
deterministic and still requires an exponential number of steps to prove that
a word is minimal.

We present a fast probabilistic algorithm HPWR which is a slight modification
of HDWR. Algorithm HPWR is very robust and extremely fast on most input
words, including words in free groups of large ranks. Although we do not have
a formal proof of the correctness of HPWR, in all the experiments that we
have performed the algorithm never produced an incorrect output.

The algorithms HDWR and its probabilistic version HPWR are described in 
Section 3. We give experimental results evaluating the performance of these 
two algorithms in Section 4. Comparison with the standard deterministic pro- 
cedure is also presented in Section 4. 



2 The Whitehead minimization problem 



Let $X = \{x_1, \ldots, x_n\}$ be a finite alphabet, let
$X^{-1} = \{x^{-1} \mid x \in X\}$ be the set of formal inverses of letters
from $X$, and let $X^{\pm 1} = X \cup X^{-1}$. A word $w = y_1 \ldots y_m$
in the alphabet $X^{\pm 1}$ is called reduced if $y_i \neq y_{i+1}^{-1}$ for
$i = 1, \ldots, m-1$ (here we assume that $(x^{-1})^{-1} = x$). Applying the
reduction rules $xx^{-1} \to \varepsilon$, $x^{-1}x \to \varepsilon$ (where
$\varepsilon$ is the empty word), one can reduce each word $w$ in the alphabet
$X^{\pm 1}$ to a reduced word $\bar{w}$. The word $\bar{w}$ is uniquely
defined and does not depend on a particular sequence of reductions. Denote by
$F = F(X)$ the set of reduced words over $X^{\pm 1}$. The set $F$ forms a
group with respect to the multiplication $u \cdot v = \overline{uv}$, called a
free group with basis $X$. The cardinality $|X|$ is called the rank of $F(X)$.
We write $F_n$ instead of $F$ to indicate that the rank of $F$ is equal
to $n$.
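
The reduction rules above can be sketched in a few lines of Python. This is
only an illustration under an assumed encoding that is not part of the paper:
the letter $x_i$ is stored as the integer i and $x_i^{-1}$ as -i, and the
function names are hypothetical.

```python
def free_reduce(word):
    """Freely reduce a word given as a list of nonzero ints (i = x_i, -i = x_i^{-1})."""
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()              # cancel a pair x x^{-1} or x^{-1} x
        else:
            out.append(letter)
    return out


def multiply(u, v):
    """Product in the free group: concatenate, then freely reduce."""
    return free_reduce(list(u) + list(v))


def inverse(u):
    """Inverse of a word: reverse it and invert every letter."""
    return [-a for a in reversed(u)]


# x1 x2 x2^{-1} x1^{-1} x3 reduces to x3
print(free_reduce([1, 2, -2, -1, 3]))
```

The stack-based pass yields the unique reduced form in time linear in the
length of the word, regardless of the order in which cancellations are found.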

A bijection $\varphi : F \to F$ is called an automorphism of $F$ if
$\varphi(uv) = \varphi(u)\varphi(v)$ for every $u, v \in F$. The set $Aut(F)$
of all automorphisms of $F$ forms a group with respect to the composition of
maps. Every automorphism $\varphi \in Aut(F)$ is completely determined by the
images $\varphi(x)$ of elements $x \in X$. Sometimes it is more convenient to
use the non-functional notation $w\varphi$ to denote the action of the
automorphism $\varphi$ on $w$.

The following two subsets of $Aut(F)$ play an important part in both group
theory and topology. An automorphism $t \in Aut(F)$ is called a Nielsen
automorphism if for some $x \in X$, $t$ fixes all elements $y \in X$,
$y \neq x$, and maps $x$ to one of the elements $x^{-1}$, $y^{\pm 1}x$,
$xy^{\pm 1}$. Note that automorphisms that map $x$ to $x^{-1}$, leaving
everything else unchanged, cannot alter the word length. Such automorphisms
will be called length-invariant automorphisms. By $N(X)$ we denote the set of
all Nielsen automorphisms of $F$ except the length-invariant ones.

A non-trivial automorphism $t \in Aut(F)$ is called a Whitehead automorphism
if it has one of the following types:

1) $t$ permutes the elements of $X^{\pm 1}$;

2) $t$ fixes a given element $a \in X^{\pm 1}$ and maps each element
$x \in X^{\pm 1}$, $x \neq a^{\pm 1}$, to one of the elements $x$, $xa$,
$a^{-1}x$, or $a^{-1}xa$.

It is easy to see that automorphisms of the first type are length-invariant. By
$W(X)$ we denote the set of Whitehead automorphisms of the second type.
Obviously, every Nielsen automorphism is also a Whitehead automorphism.

Observe that

$$|N(X)| = 4n(n-1), \qquad |W(X)| = 2n4^{(n-1)} - 2n,$$

where $n = |X|$ is the rank of $F$.
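
As a sanity check on these counts, the two sets can be enumerated directly.
The sketch below is illustrative only and rests on assumptions that are not in
the paper: generators are the integers 1..n, inverses are their negatives, and
an automorphism is stored as a dictionary from a generator to its image word,
with unlisted generators fixed.

```python
from itertools import product


def nielsen_automorphisms(n):
    """All non-length-invariant Nielsen automorphisms: x -> y^{+-1} x or x -> x y^{+-1}."""
    autos = []
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            if y == x:
                continue
            for s in (1, -1):
                autos.append({x: [s * y, x]})
                autos.append({x: [x, s * y]})
    return autos


def whitehead_automorphisms(n):
    """All Whitehead automorphisms of the second type, identity excluded."""
    autos = []
    for a in [s * i for i in range(1, n + 1) for s in (1, -1)]:
        others = [x for x in range(1, n + 1) if x != abs(a)]
        # each remaining generator x goes to x, xa, a^{-1}x or a^{-1}xa
        for choice in product(range(4), repeat=len(others)):
            if not any(choice):
                continue           # all generators fixed: the identity map
            auto = {}
            for x, c in zip(others, choice):
                auto[x] = [[x], [x, a], [-a, x], [-a, x, a]][c]
            autos.append(auto)
    return autos


for n in (2, 3, 4):
    print(n, len(nielsen_automorphisms(n)), len(whitehead_automorphisms(n)))
```

For each of the $2n$ choices of $a$ there are $4^{n-1}$ assignments, one of
which is the identity, which gives $2n(4^{n-1} - 1) = 2n4^{(n-1)} - 2n$.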



It is known (see Lyndon and Schupp (1977)) that every automorphism from
$Aut(F)$ is a product of finitely many Nielsen (hence Whitehead)
automorphisms.

The automorphic orbit $Orb(w)$ of a word $w \in F$ is the set of all
automorphic images of $w$ in $F$:

$$Orb(w) = \{v \in F \mid \exists \varphi \in Aut(F) \text{ such that } w\varphi = v\}.$$

A word $w \in F$ is called minimal (or automorphically minimal) if
$|w| \le |w\varphi|$ for any $\varphi \in Aut(F)$. By $w_{min}$ we denote a
word of minimal length in $Orb(w)$. Notice that since there may be several
elements of the minimal length in the same orbit, $w_{min}$ is not unique in
general.

Problem 2.1 (Minimization Problem (MP)) For a word $u \in F$ find an
automorphism $\varphi \in Aut(F)$ such that $u\varphi = u_{min}$.



Whitehead (1936) proved the following result which gives a solution to the
minimization problem.

Theorem 2.1 (Whitehead) Let $u \in F_n(X)$. If $|u| > |u_{min}|$, then there
exists $t \in W(X)$ such that

$$|u| > |ut|.$$



An automorphism $\varphi \in Aut(F)$ is called a length-reducing automorphism
for a given word $u \in F$ if $|u\varphi| < |u|$. The theorem above claims
that the finite set $W(X)$ contains a length-reducing automorphism for every
non-minimal word $u \in F$. This allows one to design a simple search
algorithm for (MP).

Let $u \in F$. For each $t \in W(X)$ compute the length of the word $ut$ until
$|u| > |ut|$, then put $t_1 = t$, $u_1 = ut_1$. If no such $t$ exists, stop
and output $u_{min} = u$. This procedure is called the Whitehead Length
Reduction routine (WLR). Now the Whitehead Reduction (WR) algorithm proceeds
as follows. Repeat WLR on $u$, and then on the resulting $u_1$, and so on,
until at some step $k$ WLR gives an output $u_{min}$. Then
$ut_1 \ldots t_{k-1} = u_{min}$, so $\varphi = t_1 \ldots t_{k-1}$ is the
required automorphism.

Notice that the iteration procedure WR simulates the classical greedy descent
method ($t_1$ is a successful direction from $u$, $t_2$ is a successful
direction from $u_1$, etc.). Theorem 2.1 guarantees that the greedy approach
will always converge to a global minimum.
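
The greedy descent can be sketched as follows. This is an illustrative
sketch, not the authors' implementation: the integer encoding of letters, the
dictionary representation of automorphisms and all function names are
assumptions. The list of automorphisms is a parameter; Theorem 2.1 guarantees
a global minimum only when the whole set $W(X)$ is passed, whereas the small
demo below uses Nielsen automorphisms alone to stay short.

```python
def free_reduce(word):
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()
        else:
            out.append(letter)
    return out


def apply(word, auto):
    """Image of a word under an automorphism {generator: image}; other generators are fixed."""
    image = []
    for letter in word:
        g = abs(letter)
        img = auto.get(g, [g])
        # the image of an inverse letter is the inverse of the image
        image.extend(img if letter > 0 else [-a for a in reversed(img)])
    return free_reduce(image)


def wlr(word, autos):
    """One WLR pass: the first length-reducing automorphism and its image, or None."""
    for t in autos:
        image = apply(word, t)
        if len(image) < len(word):
            return t, image
    return None


def wr(word, autos):
    """Repeat WLR until no automorphism in the list shortens the current word."""
    path = []
    while True:
        step = wlr(word, autos)
        if step is None:
            return word, path
        t, word = step
        path.append(t)


def nielsen(n):
    return [{x: img} for x in range(1, n + 1) for y in range(1, n + 1) if y != x
            for s in (1, -1) for img in ([s * y, x], [x, s * y])]


# x1 x2 x2 in F_2 is reduced to x1 by applying x1 -> x1 x2^{-1} twice
print(wr([1, 2, 2], nielsen(2)))
```

Each call to `wlr` may scan the whole list before concluding that nothing
reduces the word, which is exactly the costly case discussed below.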

Clearly, there could be at most $|u|$ repetitions of WLR on an input $u \in F$:

$$|u| > |ut_1| > \ldots > |ut_1 \ldots t_l| = |u_{min}|, \quad l \le |u|.$$

Hence the worst case complexity of Whitehead's algorithm is bounded
from above by

$$cA_n|u|,$$

where $A_n = 2n4^{(n-1)} - 2n$ is the number of Whitehead automorphisms in
$W(X)$ and the constant $c$ is a stretching factor by which the length of a
word increases after a Whitehead automorphism is applied (ignoring the low
level implementation details). One letter can be mapped into a word of length
at most 3, so $c$ is bounded by 3 and does not depend on the rank of the group
or the word's length. Since $A_n$ depends exponentially on the rank of a free
group, in the worst case scenario the algorithm seems to be impractical for
free groups with large ranks. One can try to improve on the number of steps
which it takes to find a length-reducing automorphism for a given non-minimal
element from $F$. In this context, the question of interest is the complexity
of the following

Problem 2.2 (Length Reduction Problem (LRP)) For a given non-minimal
element $u \in F$ find a length-reducing automorphism.

We refer to Haralick et al. (2005) and Miasnikov and Myasnikov (2004) for a
general discussion of this problem. Haralick et al. (2005) offer some
empirical evidence that by using smart strategies in selecting Whitehead
automorphisms $t \in W(X)$ one can dramatically improve the average complexity
of WLR in terms of the rank of a group. Some of the experimental results were
formulated as the following conjectures:



Conjecture 2.1 (Haralick et al. (2005)) Let $U_k$ be the set of all
non-minimal elements in $F$ of length $k$ and $NU_k \subset U_k$ the subset of
elements which have Nielsen length-reducing automorphisms. Then

$$\lim_{k \to \infty} \frac{|NU_k|}{|U_k|} = 1.$$



Conjecture 2.2 (Haralick et al. (2005)) The feature vectors of weights of
the Whitehead Graphs of elements from $F$ are separated into bounded regions
in the corresponding space. Each such region can be bounded by a hypersurface
and corresponds to a particular Nielsen automorphism in the sense that all
elements in the corresponding class can be reduced by that automorphism.



Arguably the conjectures above are not intuitive and most likely would have
been difficult to arrive at without observations obtained using computer
experiments. At this point we would like to mention a new development in this
area which was not available during the submission of this paper. Kapovich
(2006) has recently posted a preprint giving a mathematical proof of
Conjecture 2.2. To the best of our knowledge this is the first time
non-trivial conjectures were obtained using statistical and exploratory data
analysis techniques.



Unfortunately, one can easily see that the worst case behavior of the algorithm 
WR occurs when a word of minimal length has been reached. Except for some 
trivial cases (when a minimal word is a generator, for example) all Whitehead 
automorphisms need to be applied to a minimal word before we can conclude 
that it is, indeed, minimal. It seems that no algorithm is known to avoid 
time-consuming computation in this case. We would like to emphasize the 
importance of this fact by formulating it as a separate problem: 



Problem 2.3 (Minimal Word Classification Problem (MWCP)) For a
given $u \in F(X)$, decide whether $u$ is minimal or not.



We discussed this problem in previous papers. Haralick et al. (2004) give a
probabilistic solution which is based on regression models. Miasnikov (2004)
used the so-called support vector machines to improve the performance in free
groups of large ranks. In this paper we introduce a new, significantly more
efficient probabilistic system based on the empirical distribution of minimal
elements in the corresponding feature space (see Section 3.2).



3 Description of the Hybrid algorithms 

3.1 Heuristics for the Length Reduction Problem 

We have addressed this problem in the preceding papers. Our first approach 
was to develop a simple Stochastic Whitehead Reduction (SWR) algorithm to 
solve LRP. It is implemented as a combination of a greedy descent procedure 
with genetic search techniques. 

Define the search space $S$ as the set of all finite sequences

$$\mu = \langle t_1, \ldots, t_s \rangle$$

of Whitehead automorphisms $t_i \in W(X)$. For such $\mu$ and a word $u \in F$
define

$$u\mu = ut_1 \ldots t_s.$$

The solution to LRP is any sequence $\mu^* \in S$ such that

$$|u\mu^*| < |u|.$$

Among all such solutions we prefer the ones that give maximal length
reduction of the image. In SWR, we define the criterion function which
evaluates a solution $\mu$ as the length $|u\mu|$ of the image of $u$.
The details on the implementation and evaluation of SWR can be found in
Miasnikov and Myasnikov (2004).



To our great surprise, this naive stochastic algorithm significantly
outperformed the standard algorithm, especially in free groups of large ranks.
For example, there were very few runs of WR for words $w \in F_{10}$ with
$|w| > 100$ that finished within an hour and there were no such runs for
$|w| > 200$. Nevertheless, the stochastic algorithm still was able to find
minimal words in a matter of seconds. What seemed to be more important is that
the stochastic algorithm did not show exponential dependence on the group's
rank.

We strongly believe that if a stochastic algorithm performs very well, then
there must be a purely mathematical reason behind this phenomenon which
can be uncovered by a proper statistical analysis. Following this philosophy,
we performed an analysis of successful solutions produced by SWR. The results
helped us to define a number of search heuristics described by Haralick et al.
(2005). Below we give a brief description of these heuristics.



First, we observed that among all Whitehead automorphisms in the successful
solutions, Nielsen automorphisms statistically had a greater chance to occur.
Further experiments showed that more than 99% of non-minimal elements can
be reduced by one of the Nielsen automorphisms. Our first heuristic is based
on this observation and simply suggests trying Nielsen automorphisms first
in the routine WLR, i.e., in this case we assume that in the fixed listing of
automorphisms of $W(X)$, the automorphisms from $N(X)$ come first. We refer
to this heuristic as Nielsen First. Note that the Nielsen First heuristic is
very general and does not use information about the input word itself. We
showed that one can significantly improve the performance of the search
procedure by incorporating heuristics that use some knowledge about the input.

Let $u \in F(X)$. The undirected labeled Whitehead graph
$WG(u) = (V, E(u))$ of the word $u$ is a complete undirected graph, where the
set of vertices $V$ is equal to the set $X^{\pm 1}$. Every edge $e = (x, y)$,
$x \neq y$, of the Whitehead graph is assigned a weight
$\omega_e = n_e/|u|$, where $n_e$ is the number of times the subwords
$xy^{-1}$ or $yx^{-1}$ occur in $u$. Note that $\omega_e = 0$ if the subwords
corresponding to the edge $e$ do not occur in $u$. Now, for a given word
$u \in F$ define a special vector representation (called a feature vector)
$f(u) \in \mathbb{R}^{|E(u)|}$ such that

$$f(u) = \langle \omega_{e_1}, \ldots, \omega_{e_{|E(u)|}} \rangle.$$

The edges $e_i$ are assumed to be taken in some fixed order. Since the
Whitehead graph is complete on $2n$ vertices, the number of edges and,
therefore, the size of the feature vectors is $n(2n-1) = 2n^2 - n$ for all
elements in a free group $F_n$. The set of all feature vectors is usually
called a feature space and is denoted by $T$.
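
A minimal sketch of how such a feature vector might be computed is given
below. The encoding (i for $x_i$, -i for $x_i^{-1}$), the function name and
the choice of edge order are assumptions; the sketch also assumes a non-empty
reduced word and counts subwords in the linear (not cyclic) word. Two
consecutive letters a, b form a subword $xy^{-1}$ with $x = a$ and
$y = b^{-1}$, so they contribute to the edge between the vertices a and -b.

```python
from itertools import combinations


def whitehead_features(word, n):
    """Return (edges, weights) of the Whitehead graph of a non-empty reduced word in F_n."""
    vertices = sorted(s * i for i in range(1, n + 1) for s in (1, -1))
    edges = list(combinations(vertices, 2))    # fixed order, 2n^2 - n edges
    counts = dict.fromkeys(edges, 0)
    for a, b in zip(word, word[1:]):
        # the pair ab is the subword x y^{-1} for the edge {a, b^{-1}}
        counts[tuple(sorted((a, -b)))] += 1
    return edges, [counts[e] / len(word) for e in edges]


edges, weights = whitehead_features([1, 2, 1, 2], 2)
for e, w in zip(edges, weights):
    print(e, w)
```

Building the vector takes one pass over the word plus the allocation of the
$2n^2 - n$ entries.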

Experiments show that there is a correlation between the location of the fea- 
ture vectors in the corresponding space and the length-reducing Nielsen auto- 
morphisms. 

Let $t \in N(X)$ be a Nielsen automorphism. Define the set

$$O_t = \{w \in F \mid \text{for } r \in N(X),\ |wr| < |w| \iff r = t\}$$

as the set of all elements that can be reduced only by $t$ and no other
Nielsen automorphism. We also define a set $B_{m,t} \subset O_t$:

$$B_{m,t} = \{w \mid w \in O_t,\ |w| \le m\},$$

which is the finite set of elements from $O_t$ of length at most $m$. For a
large $m$ we define

$$\lambda_t = \frac{1}{|B_{m,t}|} \sum_{w \in B_{m,t}} f(w)$$

as an estimate of the mean feature vector of the elements in $O_t$.

Now, let

$$d(w, t) = \|f(w) - \lambda_t\|$$

be the distance (in this case, Euclidean distance) between the feature vector
of a given word $w$ and the estimate $\lambda_t$ of the mean feature vector
corresponding to the Nielsen automorphism $t$.



Haralick et al. (2005) experimentally show that about 99% of the time
a randomly generated non-minimal element $w$ can be reduced by a Nielsen
automorphism $t^*$ such that

$$d(w, t^*) = \min\{d(w, t) \mid t \in N(X)\}.$$



Now we define the second heuristic, which is called the Centroid heuristic.
For a given word $u$ compute the distances $d(u, t)$ for all $t \in N(X)$ and
sort them in increasing order:

$$d(u, t_1) \le d(u, t_2) \le \ldots \le d(u, t_k).$$

Apply the automorphisms $t_1, \ldots, t_k$ sequentially until a
length-reducing Nielsen automorphism (if any) is found.
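
The ordering step of the Centroid heuristic can be sketched as below. The
function names, the dictionary of centroids and the toy numbers are
hypothetical; in practice the centroids would be the mean feature vectors
estimated beforehand from training words for each Nielsen automorphism.

```python
import math


def mean_vector(vectors):
    """Componentwise mean of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def centroid_order(feature, centroids):
    """Automorphism labels sorted by Euclidean distance from `feature` to their centroid."""
    return sorted(centroids, key=lambda t: math.dist(feature, centroids[t]))


# toy two-dimensional centroids for three hypothetical automorphisms
centroids = {"t_a": [0.0, 1.0], "t_b": [0.5, 0.5], "t_c": [1.0, 0.0]}
print(centroid_order([0.9, 0.1], centroids))
```

The automorphisms would then be applied in the returned order until one of
them shortens the word.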

Now let $e = (x, y^{-1})$, $x, y^{-1} \in V$, be an edge in the Whitehead
graph of a word $w$ such that $x \neq y$. By construction of the Whitehead
graph, $e$ corresponds to the subwords $s_e = (xy)^{\pm 1}$.

There are only two Nielsen automorphisms that reduce the lengths of the
subwords in $s_e$:

$$\psi_e^x : x \to xy^{-1}, \quad z \to z \ \ \forall z \neq x$$

and

$$\psi_e^y : y \to x^{-1}y, \quad z \to z \ \ \forall z \neq y.$$

Denote $\psi_e = \{\psi_e^x, \psi_e^y\}$. We call the automorphisms $\psi_e$
length-reducing with respect to the edge $e$.

We can order the Nielsen automorphisms $\psi_{e_i} \subset N(X)$:

$$\langle \psi_{e_1}, \psi_{e_2}, \ldots, \psi_{e_k} \rangle \qquad (1)$$

such that the corresponding edges $e_1, \ldots, e_k$ are chosen according to
the decreasing order of the values of their weights

$$\omega_{e_1} \ge \omega_{e_2} \ge \ldots \ge \omega_{e_k}.$$



In the third heuristic we apply Nielsen automorphisms in the order given
by (1). This heuristic is called the Maximal Edge heuristic. In Haralick
et al. (2005) we present empirical evidence that most non-minimal elements can
be reduced by one of the automorphisms in $\psi_{e_1}$, given that
$\omega_{e_1}$ is maximal.
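
A hedged sketch of the Maximal Edge heuristic follows. The letter encoding
and all names are assumptions made for illustration. For an edge between
vertices p and q (the subwords $(pq^{-1})^{\pm 1}$), the two Nielsen moves are
written at the level of letters as p -> pq and q -> qp; edges joining a letter
to its own inverse are skipped because no Nielsen automorphism corresponds to
them.

```python
from collections import Counter


def free_reduce(word):
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()
        else:
            out.append(letter)
    return out


def apply_letter_map(word, letter_map):
    """Substitute letters by their images (unlisted letters are fixed), then reduce."""
    image = []
    for letter in word:
        image.extend(letter_map.get(letter, [letter]))
    return free_reduce(image)


def nielsen_for_edge(p, q):
    """The two Nielsen moves for the edge {p, q}: p -> pq and q -> qp."""
    return [{p: [p, q], -p: [-q, -p]}, {q: [q, p], -q: [-p, -q]}]


def maximal_edge_step(word):
    """Try the edge automorphisms in decreasing order of edge weight, as in ordering (1)."""
    counts = Counter(tuple(sorted((a, -b)))
                     for a, b in zip(word, word[1:]) if abs(a) != abs(b))
    for (p, q), _ in counts.most_common():
        for letter_map in nielsen_for_edge(p, q):
            image = apply_letter_map(word, letter_map)
            if len(image) < len(word):
                return image
    return None


# x1 x2 x1 x2: the heaviest edge gives x2 -> x1^{-1} x2, which yields x2 x2
print(maximal_edge_step([1, 2, 1, 2]))
```

Restricting the loop to the first edge only gives the single fast check that
uses the maximal edge alone.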
Table 1

99th percentile of the number of Nielsen automorphisms computed for different
heuristics applied to test sets $S_1$ and $S_{10}$ in free groups $F_3$,
$F_4$ and $F_5$.

To get a better understanding of how effective these methods are, we estimate 
the 99th percentile of the number of Nielsen automorphisms required to reduce 
a non-minimal word from a given test set (see Table 1). 

We can see that the Centroid heuristic is the most effective and is able to
predict a length-reducing automorphism with very high accuracy. The Maximal
Edge heuristic also uses very few automorphisms, whereas the Nielsen First
heuristic must apply at least 60% of Nielsen automorphisms in the best case.

The Nielsen First heuristic does not require any additional computations and,
therefore, its computational complexity is of order $O(1)$. The Maximal Edge
heuristic requires $O(|w| + n^2)$ steps. The Centroid heuristic requires
$O(|w| + n^4)$ elementary steps, where $n$ is the rank of a free group.
Therefore, there is a tradeoff between the length of the input word and the
rank of the group. Since the Centroid and Maximal Edge heuristics are more
accurate, they become more attractive as the length of the input word
increases, because fewer superfluous automorphisms will be applied to the
input word.

In Section 3.3 we show how the stochastic algorithm SWR can be combined 
with Centroid and Maximal Edge heuristics to improve the solution of the 
Whitehead Minimization problem. 



3.2 Probabilistic System for Classification of Minimal Words 



We have already mentioned that the worst case of the standard Whitehead
algorithm applied to solve LRP occurs when the word is already minimal. Note
that exponential blowout is inevitable in the WR algorithm, unless minimal
words are of a very special type. Being able to solve the Minimal Word
Classification Problem efficiently is crucial for an efficient solution to the
Whitehead Minimization Problem. In Haralick et al. (2004) and Miasnikov (2004)
we describe several stochastic classification systems (classifiers) based on
pattern recognition techniques such as regression and support vector machines.
These classifiers are able to decide whether a given word is minimal in
polynomial time (with respect to group rank) with a very small error of
misclassification.

Conclusions in Miasnikov (2004) suggest that one of the classes (minimal or
non-minimal) of elements could be located in a compact region in the feature
space $T$ and can be bounded by a hypersurface.

To support this conjecture we perform the following experiment. Assume that
the feature vectors of minimal elements follow the multivariate normal
distribution $\mathcal{N}(\mu, \Sigma)$ with the mean $\mu$ and the covariance
$\Sigma$. We estimate $\mu$ and $\Sigma$ from a set of randomly generated
minimal elements. Experiments show that more than 97% of minimal elements lie
inside the hyperellipsoid corresponding to the 99.9% confidence interval for
$\mu$. Moreover, no non-minimal elements fall inside that region. This is a
very strong indication that the feature vectors of minimal elements indeed lie
compactly in $T$.

Using this result we construct a new probabilistic system WMIN to solve
the Minimal Word Classification Problem. We decide that a given word $u$ is
minimal if its feature vector $f(u)$ falls inside the corresponding
hyperellipsoid, and that $u$ is non-minimal otherwise. To be more precise, let
$\mu$ and $\Sigma$ be, respectively, the mean and the covariance matrix of
feature vectors of minimal elements. Using the so-called Mahalanobis distance
we define the decision rule for the feature vector $x = f(u)$:

$$u \text{ is } \begin{cases} \text{minimal}, & \text{if } (x - \mu)^T \Sigma^{-1} (x - \mu) \le \rho; \\ \text{non-minimal}, & \text{otherwise,} \end{cases}$$

where $(x - \mu)^T$ is the transpose of the column vector $x - \mu$. One way
of estimating the threshold $\rho$ was indicated above, where it was taken to
correspond to the 99.9% confidence interval of $\mu$, given that feature
vectors follow the multivariate normal distribution. However, in this case the
error of misclassifying minimal elements is unacceptably large (greater than
5%). This indicates that the feature vectors actually are not normally
distributed.

A practical way to estimate $\rho$ is to estimate the distribution of
distances from feature vectors of minimal elements to their mean. Then we take
$\rho$ such that $100(1 - \alpha)$ percent of minimal elements have distances
less than $\rho$ for a given $\alpha$. Note that $\alpha$ corresponds to a
confidence level in non-parametric hypothesis testing.
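
The decision rule and the empirical choice of $\rho$ can be sketched in plain
Python as follows. This is an illustration under assumptions, not the WMIN
implementation: the function names are hypothetical, the covariance matrix is
assumed to be invertible, and the quadratic form is evaluated by solving a
linear system with Gaussian elimination rather than by inverting $\Sigma$.

```python
import math


def solve(a, b):
    """Solve the linear system a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - rest) / m[r][r]
    return y


def mahalanobis_sq(x, mu, sigma):
    """The quadratic form (x - mu)^T sigma^{-1} (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * yi for di, yi in zip(d, solve(sigma, d)))


def threshold(distances, alpha):
    """Smallest rho with at least 100(1 - alpha)% of the training distances <= rho."""
    ordered = sorted(distances)
    k = max(1, math.ceil((1 - alpha) * len(ordered)))
    return ordered[k - 1]


def classify(x, mu, sigma, rho):
    return "minimal" if mahalanobis_sq(x, mu, sigma) <= rho else "non-minimal"


print(mahalanobis_sq([2.0, 1.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```

In use, `distances` would hold the values of the quadratic form for the
training sample, so that the threshold is read off their empirical
distribution instead of a normality assumption.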

To compute the Mahalanobis distance we need to obtain $\mu$ and $\Sigma$. One
way is to estimate them from a set of randomly generated minimal elements.
This process is usually called "training" the classifier. Unfortunately, to
generate the sample of minimal elements we need to solve the length reduction
problem which, as we have argued, is hard in groups of large ranks. Below we
suggest a more efficient training procedure.

Kapovich et al. (2004) show that a random cyclically reduced element in a free
group is minimal with asymptotic probability 1. It is also easily shown that
any minimal element is already cyclically reduced. Following these two facts,
we suggest estimating $\mu$ and $\Sigma$ from a set of randomly generated
elements of a whole free group. This can be implemented very efficiently even
in groups with large ranks (see Miasnikov and Myasnikov (2004) for more
details on generating random elements in a free group).

We would like to mention here the two kinds of errors that may occur when 
solving the Minimal Word Classification Problem. The first is the error of 
classifying a non-minimal element as minimal. It is called the false positive 
error. The second is the error of classifying a minimal element as a non-minimal 
element. It is called the false negative error. The rate of the false positive 
error in all our experiments was zero. This property of the classifier is very 
important for a successful implementation of the probabilistic version of the 
hybrid algorithm. We will return to this discussion when we describe HPWR 
in Section 3.4. 



3.3 HDWR 



The deterministic hybrid procedure HDWR is given in Figure 1. The algorithm 
contains two major parts. The first part consists of a number of so-called fast 
checks - linear or polynomial procedures that can solve the length reduction 
problem on some inputs. In fact, the fast checks used in HDWR are expected to 
reduce most non-minimal elements in a free group. The problem is that there 
are non-minimal words which cannot be reduced by fast procedures. Using 
the fast checks alone, one cannot decide whether an input word is minimal or 
not. We need to provide a termination condition of the algorithm. This task is 
solved by the second part of the algorithm which is a version of the standard 
deterministic algorithm WR. Note that in most cases, the computationally 
ineffective procedure WR is expected to be executed only on minimal elements. 

HDWR is an iterative procedure. On each iteration, the length reduction
problem is solved for the word $w_c$, which is the shortest automorphic image
of the input word $w$ found so far. The algorithm terminates when no further
reduction is possible, and the current word $w_c$ is returned as a minimal
word $w_{min}$.

The first step in the algorithm is the classification procedure WMIN (line 5),
which decides whether the current word is minimal or not. Even though the
reduction procedures used as fast checks do not require significant
computational resources, this step helps avoid superfluous computations by
recognizing minimal elements at the first stage of the algorithm.
DETERMINISTIC HYBRID ALGORITHM

 1: SET the current word w_c; reduced = true;
 2: WHILE reduced BEGIN
 3:   reduced = false;
 4:   /* Begin fast checks */
 5:   IF w_c NOT classified as minimal BEGIN
 6:     IF ψ_{e_max} (Maximal Edge) reduces w_c
 7:       w_c = reduced word; reduced = true;
 8:     ELSE IF Centroid reduces w_c
 9:       w_c = reduced word; reduced = true;
10:     ELSE IF Stochastic Algorithm reduces w_c
11:       w_c = reduced word; reduced = true;
12:   END IF
13:   /* End fast checks */
14:   IF NOT reduced AND W(X) reduce w_c
15:     w_c = reduced word; reduced = true;
16: END WHILE
17: STOP;



Fig. 1. Algorithm HDWR. 
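
The control flow of Figure 1 can be sketched in Python as below. This only
mirrors the loop structure; the classifier, the fast checks and the exhaustive
search over $W(X)$ are passed in as callables, and the stubs in the demo are
hypothetical stand-ins that merely shorten a list, not real automorphisms.

```python
def hdwr(word, is_minimal, fast_checks, whitehead_reduce):
    """Loop structure of HDWR.

    is_minimal(word)       -> bool, the probabilistic classifier (line 5)
    fast_checks            -> callables word -> shorter word or None (lines 6-11)
    whitehead_reduce(word) -> shorter word or None, exhaustive WLR over W(X) (line 14)
    """
    current = word
    reduced = True
    while reduced:
        reduced = False
        if not is_minimal(current):
            for check in fast_checks:          # Maximal Edge, Centroid, SWR
                shorter = check(current)
                if shorter is not None:
                    current, reduced = shorter, True
                    break
        if not reduced:                        # deterministic termination test
            shorter = whitehead_reduce(current)
            if shorter is not None:
                current, reduced = shorter, True
    return current


# toy stubs: the fast check drops two letters, the exhaustive check drops one
result = hdwr(
    list(range(6)),
    is_minimal=lambda w: False,
    fast_checks=[lambda w: w[:-2] if len(w) > 3 else None],
    whitehead_reduce=lambda w: w[:-1] if len(w) > 1 else None,
)
print(result)
```

The exhaustive call is reached only when the classifier or every fast check
fails to produce a reduction, which is why it is expected to run mostly on
minimal words.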

Fast reduction procedures are based on the search heuristics described in
Section 3.1. We use the Maximal Edge heuristic as the first fast check because
it requires the least number of steps ($O(n^2 + |w|)$) when compared to other
methods. Moreover, the $n^2$ term comes from constructing the Whitehead graph,
which is required for all heuristics. Note that more than 90% of non-minimal
elements are expected to be reduced using one of the two automorphisms
corresponding to the maximal weight edge of $WG(w_c)$.

Let $e_{max}(w_c)$ be the maximal edge in $WG(w_c)$ and $\psi_{e_{max}}$ be
the set of two length-reducing automorphisms with respect to the edge
$e_{max}(w_c)$. On line 6 of the algorithm HDWR we apply the automorphisms
$\psi_{e_{max}}$ to $w_c$. The maximal number of steps required to perform the
fast check is

$$O(n^2 + |w_c|).$$

These steps include the construction of the feature vector $f(w_c)$ and the
application of the automorphisms $\psi_{e_{max}}$. Following the observations
from Haralick et al. (2005) we expect most non-minimal elements to be reduced
at line 6. If the word $w_c$ has been reduced by one of the two automorphisms,
say $\psi'_{e_{max}}$, we substitute $w_c$ with $\psi'_{e_{max}}(w_c)$ and
start a new iteration. If the Maximal Edge check fails, we continue the
reduction process with the next step.

As the next fast check we use the Centroid heuristic. Let $\psi_{e_{max}}$ be
the set of two automorphisms applied at line 6 and $N(X)$ be the set of
Nielsen automorphisms. We order the automorphisms

$$\langle \varphi_1, \ldots, \varphi_k \rangle, \quad \varphi_i \in N(X) \setminus \psi_{e_{max}}, \qquad (2)$$

such that

$$d(w_c, \varphi_1) \le d(w_c, \varphi_2) \le \ldots \le d(w_c, \varphi_k).$$

We apply the automorphisms $\varphi_1, \ldots, \varphi_k$ in the order given
by (2) to the word $w_c$. If one of the automorphisms reduces the length of
$w_c$, we stop and start a new iteration with the new, reduced word $w_c$. The
maximal number of steps required to execute this fast check is

$$O(n^4 + n^2|w_c|).$$

By line 10 we already know that the word $w_c$ does not have Nielsen
length-reducing automorphisms. The suggested strategy at this point is to try
to reduce $w_c$ by executing SWR for a predefined number of generations. If
SWR fails to find a length-reducing automorphism, we continue with the
algorithm WR. A very conservative bound for the expected maximal number of
generations of the stochastic algorithm was given in Miasnikov and Myasnikov
(2004). Note that since the algorithm SWR performs better than WR only in
groups with relatively large ranks (greater than 5), performance might improve
if we omit step 10 when the rank of the free group is small.

The maximal time complexity to find a length-reducing automorphism for $w_c$
using HDWR is still

$$O(2^n|w_c|).$$

However, following the discussion in Section 3.1, we expect the length
reduction process to be extremely efficient for most non-minimal words.
Unfortunately, as previously stated, the worst case behavior of the algorithm
occurs when the current word $w_c$ becomes minimal and, therefore, it is
inevitable except for some trivial cases. In the next section we introduce a
probabilistic algorithm that addresses this problem.



3.4 HPWR 



Words of both types, minimal and non-minimal, may cause an exponential
blowout in the algorithm WR. However, non-minimal words do not seem to
be a major problem since we have shown that most of them can be reduced
by one of the Nielsen automorphisms.

STOCHASTIC HYBRID ALGORITHM

 1: SET the current word w_c; reduced = true;
 2: WHILE reduced BEGIN
 3:   reduced = false;
 4:   IF ψ_{e_max} (Maximal Edge) reduces w_c
 5:     w_c = reduced word; reduced = true;
 6:   ELSE IF Centroid reduces w_c
 7:     w_c = reduced word; reduced = true;
 8:   ELSE IF w_c classified as minimal
 9:     STOP;
10:   ELSE IF Stochastic Algorithm reduces w_c
11:     w_c = reduced word; reduced = true;
12: END WHILE
13: STOP;

Fig. 2. Algorithm HPWR.

The bottleneck in solving the length reduction problem is the lack of a fast
algorithm for deciding whether a word is minimal or not. In fact, the only
known deterministic solution is the algorithm WR itself. Recall that the
worst case of the algorithm occurs when the input word is already minimal. In
this case all of the Whitehead automorphisms have to be applied to the word
before the decision that the word is minimal can be made.

In this section we introduce a Hybrid Probabilistic Whitehead Reduction
algorithm HPWR for solving Whitehead's Minimization problem. In HPWR the
decision on whether or not a word is minimal is made using a probabilistic
classification system WMIN. This allows one to avoid the exponential blowout
at the cost of a very small probability of a classification error.

We construct HPWR from HDWR in two steps. First, we remove the last step
(line 14) from the algorithm (see Figure 2). Note the increased role the
stochastic algorithm SWR plays: it is the only method in HPWR capable of
reducing non-minimal elements that do not have Nielsen length-reducing
automorphisms.

Second, we move the classification step so that it runs after the fast
reduction procedures. To explain this modification, we return to the
discussion of the roles played by the two types of classification errors of
the classifier WMIN in view of its new application.

Recall that the two errors are: the false positive error (classifying a
non-minimal element as minimal) and the false negative error (classifying a
minimal element as non-minimal). Now observe that once the classifier WMIN
decides that the word w_c is minimal, algorithm HPWR terminates and returns
w_c as the result. There is no backtracking or additional checking performed
after the decision is made. This means that if a non-minimal word w_c is
classified as minimal, the algorithm will produce an incorrect result. In
contrast, when a minimal word is misclassified as non-minimal, the cost of
the error is the number of extra computational steps performed by the
algorithm in an attempt to reduce a non-reducible word. What is important is
that the algorithm still produces a correct result.

Let ε be the probability that WMIN commits a false positive error. Now assume
that during the reduction process the classifier WMIN was called k times to
decide whether an element w_c is minimal or not. The probability that the
algorithm terminates with a correct answer is (1 − ε)^k. This shows that the
probability of giving an incorrect answer, 1 − (1 − ε)^k, grows rapidly with
the number of times the minimality decision is made.
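
The dependence on the number of classifier calls can be tabulated with a few lines of Python. This is only an illustrative sketch: it assumes, as the formula above does, that the calls err independently, and the per-call false positive rate `eps` used below is a hypothetical value, not one measured in this paper.

```python
def failure_probability(eps: float, k: int) -> float:
    """Probability of at least one false positive in k independent classifier calls."""
    return 1.0 - (1.0 - eps) ** k


eps = 0.001  # hypothetical per-call false positive rate (an assumption)
for k in (1, 10, 100, 1000):
    # For small eps the value is close to k * eps until it starts to saturate.
    print(k, round(failure_probability(eps, k), 4))
```

For small rates the failure probability is close to k times the per-call rate, which is why keeping the number of classifier calls low matters as much as keeping the rate itself low.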

Note that most of the reductions are expected to be done by the fast check
procedures. Moving the classification step after the fast checks allows us to
reduce the probability of producing an incorrect answer while still
maintaining a small computational cost on average.

The arguments above show that the false positive error of the classifier has
crucial importance. It is necessary to keep the false positive rate as low as
possible in order for the algorithm to perform correctly.

It has been noted in Section 3.2 that the error of misclassifying non-minimal 
elements was zero in all our experiments and, therefore, we expect it to be 
very small in all instances. 
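
The control flow of Fig. 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the name `hpwr_loop`, the `Reducer` interface (return a strictly shorter word or `None`) and the toy reducer `cancel_pair` are all hypothetical. In particular, `cancel_pair` merely deletes a cancelling pair of letters and stands in for the Maximal Edge and Centroid heuristics and for WMIN only so that the loop can be run.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Word = List[int]                             # letter x_i as i, its inverse as -i
Reducer = Callable[[Word], Optional[Word]]   # returns a strictly shorter word, or None


def hpwr_loop(word: Word,
              fast_checks: Sequence[Reducer],
              classified_minimal: Callable[[Word], bool],
              stochastic_search: Reducer) -> Tuple[Word, int]:
    """Return the final word and the number of classifier calls made."""
    w, classifier_calls = list(word), 0
    while True:
        shorter = None
        for check in fast_checks:            # lines 4-7: cheap heuristics first
            shorter = check(w)
            if shorter is not None:
                break
        if shorter is None:
            classifier_calls += 1            # line 8: classifier runs only if fast checks fail
            if classified_minimal(w):
                return w, classifier_calls   # line 9: no backtracking after this decision
            shorter = stochastic_search(w)   # line 10: last resort
            if shorter is None:
                return w, classifier_calls   # nothing reduced w, so the loop ends
        w = shorter


# Toy stand-in (NOT one of the paper's heuristics): delete one pair x, x^-1.
def cancel_pair(w: Word) -> Optional[Word]:
    for i in range(len(w) - 1):
        if w[i] == -w[i + 1]:
            return w[:i] + w[i + 2:]
    return None


result, calls = hpwr_loop([1, 2, -2, -1, 3],
                          fast_checks=[cancel_pair],
                          classified_minimal=lambda w: cancel_pair(w) is None,
                          stochastic_search=lambda w: None)
print(result, calls)
```

In this toy run the fast check performs both reductions, so the classifier is consulted only once, on the final word. This mirrors the motivation for placing the classification step after the fast checks.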



4 Evaluation 



In this section we evaluate the algorithms HDWR and HPWR and compare 
their performance to the performance of the algorithm WR. 

We evaluate these algorithms on the following test sets of randomly generated
cyclically reduced non-minimal elements:

S_1: contains minimal and non-minimal elements in equal proportions.
    Non-minimal elements are obtained with one Whitehead automorphism.

S_P: set of pseudo-randomly generated primitive elements in F. Recall that
    w ∈ F(X) is primitive if and only if there exists an automorphism
    α ∈ Aut(F) such that α(w) ∈ X^{±1}.

S_10: generated similarly to S_1, but up to 10 automorphisms are used to
    generate non-minimal elements.

Some characteristics of the sets in free groups F_3, F_4 and F_5 are given in
Table 2.

Dataset   Size   Min. length   Avg. length   Max. length   Std. deviation
S_1      10143             3         605.8          1306            359.3
S_10      2535             3        1507.9         13381           1527.9
S_P       5645             3        1422.1        143020           5379.0

a) F_3

Dataset   Size   Min. length   Avg. length   Max. length   Std. deviation
S_1      10176             4         629.3          1366            374.9
S_10      2498             5        2273.7         34609           2679.1
S_P       5741             4        4785.3        763650          19266.4

b) F_4

Dataset   Size   Min. length   Avg. length   Max. length   Std. deviation
S_1      10165             5         650.6          1388            385.4
S_10      2566             7        2791.1         28278           3234.9
S_P       3821             5        2430.5        160794           6491.0

c) F_5

Table 2
Description of the test sets of non-minimal elements in free groups F_3, F_4
and F_5.

Let A be one of the algorithms WR, HDWR or HPWR. By an elementary step 
of the algorithm A, we mean one application of a Whitehead automorphism 
to a given word. Below we evaluate the performance of A with respect to the 
number of elementary steps. 
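
To make the notion of an elementary step concrete, the following Python sketch applies one Whitehead automorphism of the second type to a word. It is an illustration under assumptions, not the code used in the experiments: letters are encoded as nonzero integers (x_i as `i`, its inverse as `-i`), the automorphism is given in the standard textbook form (A, a), and the function names are hypothetical.

```python
from typing import FrozenSet, List

Word = List[int]  # generator x_i is i, its inverse x_i^-1 is -i


def apply_whitehead(word: Word, a: int, A: FrozenSet[int]) -> Word:
    """Apply the automorphism (A, a) letter by letter, freely reducing on the fly."""
    assert a in A and -a not in A
    out: Word = []
    for x in word:
        if x in (a, -a):
            image = [x]                       # a and a^-1 are fixed
        else:
            left = [-a] if -x in A else []    # x^-1 in A  ->  a^-1 on the left
            right = [a] if x in A else []     # x in A     ->  a on the right
            image = left + [x] + right
        for y in image:
            if out and out[-1] == -y:
                out.pop()                     # cancel y against its inverse
            else:
                out.append(y)
    return out


def cyclically_reduce(word: Word) -> Word:
    """Strip matching first/last letters x ... x^-1 from a freely reduced word."""
    w = list(word)
    while len(w) >= 2 and w[0] == -w[-1]:
        w = w[1:-1]
    return w


w = [1, 2, 1, 2]                               # x1 x2 x1 x2
t = (-2, frozenset({1, -2}))                   # x1 -> x1 x2^-1, x2 fixed
image = cyclically_reduce(apply_whitehead(w, *t))
print(image, len(image) < len(w))
```

Each call to `apply_whitehead` corresponds to one elementary step; the step is length-reducing when the cyclically reduced image is shorter than the input, as in the usage example.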

Let N_s = N_s(A, S) be the average number of elementary steps required by A
to find a minimal element for a given input w ∈ S, where S ⊂ F_n is a test set.

By N_red = N_red(A, S) we denote the average number of elementary
length-reducing steps required by A to reduce a given element w ∈ S to a
minimal one, so N_red is the average number of "productive" steps performed
by A. It follows that if t_1, ..., t_l are all the length-reducing
automorphisms found by A when executing its routine on an input w ∈ S, then
|w t_1 ... t_l| = |w_min| and the average value of l is equal to N_red.

We use the values N_s and N_red as measures of the performance of the
algorithms. In addition, we record the CPU time T(w) spent by an algorithm
to produce a solution for a particular word w. Since HPWR is a probabilistic
algorithm, there is a possibility of producing an incorrect solution. We
measure the error of a probabilistic Whitehead reduction algorithm A by
computing the fraction of elements for which A failed to return a minimal
element. Let Sol_A(w) ∈ F_n be the solution produced by algorithm A. If the
result is correct, then |Sol_A(w)| = |w_min|. The error rate of A with
respect to the test set S is

\[ E(A) = \frac{|\{ w \in S : |Sol_A(w)| > |w_{min}| \}|}{|S|}. \]

In all the experiments we have done with the stochastic algorithm HPWR the 
error rate was zero, i.e. it has never happened that a non-minimal element has 
been claimed to be minimal. 
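
The measures defined above are simple averages over per-word run records, as the following Python sketch shows. The `Run` record and the `summarize` function are hypothetical names introduced for illustration, and the four records are invented; they are not measurements from the experiments reported here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Run:
    steps: int            # all Whitehead automorphisms applied (elementary steps)
    reducing_steps: int   # those that shortened the word
    result_len: int       # length of the word returned by the algorithm
    min_len: int          # true minimal length, known from an exact method such as WR


def summarize(runs: List[Run]) -> dict:
    """Average step counts and the fraction of runs that stopped above the minimum."""
    return {
        "N_s": mean(r.steps for r in runs),
        "N_red": mean(r.reducing_steps for r in runs),
        "error_rate": sum(r.result_len > r.min_len for r in runs) / len(runs),
    }


# Invented records for illustration only; the last one is a deliberate failure.
runs = [Run(40, 12, 3, 3), Run(55, 20, 5, 5), Run(31, 9, 4, 4), Run(62, 25, 7, 6)]
print(summarize(runs))
```

Note that computing the error rate requires the true minimal length of every input, which is why it can only be assessed in ranks where an exact method remains feasible.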

First, we experiment with groups of smaller ranks. For elements in the free
groups F_3, F_4 and F_5, algorithm WR can decide in a practically acceptable
amount of time whether an element is minimal or not. This allows us to obtain
the true lengths of minimal elements for each of the input words and to
assess the error rate of the probabilistic algorithms. Results are presented
in Tables 3-5, where

\[ T_{avg} = \frac{1}{|S|} \sum_{w \in S} T(w) \]

and S is the corresponding test set.

From the tables we can see that both algorithms, HDWR and HPWR,
significantly outperform WR on the sets of primitive elements, with the error
of HPWR being small (actually zero). This shows that the fast checks are
efficient reduction heuristics. The same picture holds for other sets as well.
However, the performance of HDWR deteriorates on the sets S_1 and S_10, where
it is much more difficult to decide whether or not an element is minimal.
We have already mentioned that in the case of a minimal element all of the
Whitehead automorphisms must be applied to confirm that it is indeed
minimal. The sizes of the sets of Whitehead elementary automorphisms in free
groups F_3, F_4 and F_5 are |Ω_3| = 90, |Ω_4| = 504 and |Ω_5| = 2550, respectively.
From Tables 4 and 5 we can see that the value of N_s for HDWR is in all cases
just a little greater than the size of Ω_n. This indicates that HDWR spends
most of its automorphisms and, therefore, time on elements of minimal length.
In contrast, algorithm HPWR seems to be able to avoid the exponential blowout
by quickly recognizing minimal elements using the classifier WMIN. Note that
N_s computed for HPWR is smaller than |Ω_n| in all experiments.

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR         360.2      267.5         27.3        18.5         0.11        0.46
HDWR        41.2       29.1         24.2        15.0         0.01        0.05
HPWR        41.1       29.6         24.2        15.0         0.01        0.05

a) F_3

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR        2679.7     2356.5         57.5        37.3         2.03        8.75
HDWR       118.1      114.8         45.5        28.0         0.08        0.31
HPWR       118.3      117.1         45.5        28.0         0.07        0.26

b) F_4

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR       16319.9   20284.53         79.3        52.6         5.52        16.4
HDWR       276.5      539.4         58.9        38.6         0.12        0.29
HPWR       239.2      324.8         58.0        35.6         0.08        0.16

c) F_5

Table 3
Comparison of algorithms WR, HDWR and HPWR on the test sets of primitive
elements S_P in free groups F_3, F_4 and F_5, where N_s is the average number
of elementary steps to find a minimal element, N_red the average number of
length-reducing steps, and T_avg the average time (in seconds) spent on an
input.

To show that algorithm HPWR is applicable to groups of large ranks, we
perform experiments with primitive elements in free groups F_10, F_15 and F_20
(see Table 6). We can see that HPWR was able to find solutions quickly, with
N_s growing very slowly with the rank.
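
To put these step counts in perspective, the sketch below compares them with the number of Whitehead automorphisms that an exhaustive check would have to try. It assumes the standard count of non-identity Whitehead automorphisms of the second type, 2n(4^(n-1) - 1); this formula is not stated in the text, but it reproduces the sizes 90, 504 and 2550 quoted above for ranks 3, 4 and 5. The N_s values are the HPWR means copied from Table 6.

```python
def whitehead_count(n: int) -> int:
    """Number of non-identity Whitehead automorphisms x -> x, xa, a^-1 x, a^-1 x a."""
    # 2n choices of the letter a, 4 behaviours for each of the other n - 1
    # generators, minus the identity map counted once per choice of a.
    return 2 * n * (4 ** (n - 1) - 1)


for n in (3, 4, 5):
    print(n, whitehead_count(n))      # sizes quoted in the text: 90, 504, 2550

# Mean N_s of HPWR on primitive elements, copied from Table 6.
table6_ns = {10: 595.2, 15: 671.1, 20: 736.3}
for n, ns in table6_ns.items():
    # Ratio of steps actually used to the size of one exhaustive pass.
    print(n, whitehead_count(n), ns / whitehead_count(n))
```

Under this count a single exhaustive pass in rank 10 already involves more than five million automorphisms, while the reported mean N_s stays below one thousand.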

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR         129.0       33.1         2.14        1.15         0.05        0.03
HDWR       117.5       10.4         1.69        0.81         0.04        0.03
HPWR        53.6      142.4         1.70        0.82        0.011        0.01

a) F_3

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR         734.9      188.3         3.24        1.72         0.30        0.20
HDWR       552.1       61.1         2.42        1.19         0.21        0.12
HPWR       140.7      387.9         2.43        1.29         0.02        0.06

b) F_4

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR        3541.3      908.2         4.28        2.19         1.45        1.05
HDWR      2601.6      341.7         3.29        1.73         0.70        0.41
HPWR       316.8      895.2         3.29        1.73         0.05        0.06

c) F_5

Table 4
Comparison of algorithms WR, HDWR and HPWR on the test sets S_1 in free
groups F_3, F_4 and F_5, where N_s is the average number of elementary steps
to find a minimal element, N_red the average number of length-reducing steps,
and T_avg the average time (in seconds) spent on an input.

5 Conclusion 



The search heuristics described in Haralick et al. (2005) can be successfully
applied to solving the Whitehead Reduction problem. The probabilistic algorithm
HPWR is very robust and can be used in groups with large ranks, whereas any
other known algorithm fails to produce similar results because the worst case
is inevitable for most inputs. The computational advantage of HPWR increases
with the rank of the free group. Indeed, HPWR performs about 11 times faster
than WR in F_3 and more than 60 times faster than WR in F_5.
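
The speed-up factors quoted here can be recomputed from the mean CPU times on primitive elements reported in Table 3. The short Python sketch below does this; since the tabulated times are rounded to two decimals, the ratios should be read as rough estimates only.

```python
# Mean CPU time per input in seconds (T_avg), copied from Table 3.
t_avg = {
    3: {"WR": 0.11, "HPWR": 0.01},
    4: {"WR": 2.03, "HPWR": 0.07},
    5: {"WR": 5.52, "HPWR": 0.08},
}


def speedup(rank: int) -> float:
    """Ratio of WR's mean time to HPWR's mean time for the given rank."""
    return t_avg[rank]["WR"] / t_avg[rank]["HPWR"]


for rank in sorted(t_avg):
    print(f"F_{rank}: WR / HPWR = {speedup(rank):.0f}")
```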

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR         203.5       86.5         8.45        5.49         0.12        0.13
HDWR       124.5       13.5         6.54        3.94         0.05        0.03
HPWR        64.7      153.9         6.54        3.94         0.02        0.01

a) F_3

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR        1278.6      527.5         17.1        10.7         1.01        0.65
HDWR       569.7       67.5         11.6        7.16         0.36        0.33
HPWR       172.0      416.0         11.6        7.16         0.04        0.03

b) F_4

A       N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
WR        7650.5     4468.0         27.1        17.8         5.87        8.81
HDWR      2650.5      342.9         17.1        10.8         1.06        0.63
HPWR       360.7      904.5         16.9        10.6         0.08        0.08

c) F_5

Table 5
Comparison of algorithms WR, HDWR and HPWR on the test sets S_10 in free
groups F_3, F_4 and F_5, where N_s is the average number of elementary steps
to find a minimal element, N_red the average number of length-reducing steps,
and T_avg the average time (in seconds) spent on an input.

Group   N_s mean    N_s std   N_red mean   N_red std   T_avg mean   T_avg std
F_10       595.2     9195.5         55.5        37.0         0.20        0.58
F_15       671.1      883.1        106.1        55.5         1.03        0.59
F_20       736.3      874.4        128.4        61.8         2.80        1.41

Table 6
Performance of the algorithm HPWR on sets of primitive elements in free groups
F_10, F_15 and F_20, where N_s is the average number of elementary steps to
find a minimal element, N_red the average number of length-reducing steps,
and T_avg the average time (in seconds) spent on an input.